XNORBIN: A 95 TOp/s/W Hardware Accelerator for Binary Convolutional Neural Networks
Authors
Abstract
Deploying state-of-the-art CNNs requires power-hungry processors and off-chip memory, which precludes the implementation of CNNs in low-power embedded systems. Recent research shows that CNNs can sustain extreme quantization, binarizing their weights and intermediate feature maps, thereby saving 8-32× in memory and collapsing the energy-intensive sum-of-products into XNOR-and-popcount operations. We present XNORBIN, an accelerator for binary CNNs with computation tightly coupled to memory for aggressive data reuse. Implemented in UMC 65 nm technology, XNORBIN achieves an energy efficiency of 95 TOp/s/W and an area efficiency of 2.0 TOp/s/MGE at 0.8 V.
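The XNOR-and-popcount trick referenced in the abstract can be illustrated with a short sketch. The C snippet below is not from the paper; the bit-packing scheme, word width, and function names are illustrative assumptions. It shows how a dot product between two {-1, +1} vectors, with each element stored as one bit (1 encoding +1, 0 encoding -1), reduces to a bitwise XNOR followed by a population count.

/*
 * Illustrative sketch (not the XNORBIN implementation): binary dot product
 * via XNOR + popcount. Assumes both vectors are packed 32 elements per
 * uint32_t word and that n is a multiple of 32 so no padding bits exist.
 */
#include <stdint.h>
#include <stdio.h>

static int binary_dot(const uint32_t *a, const uint32_t *w, int n)
{
    int matches = 0;
    int words = n / 32;
    for (int i = 0; i < words; i++) {
        uint32_t agree = ~(a[i] ^ w[i]);        /* XNOR: bit set where signs match */
        matches += __builtin_popcount(agree);   /* count matching positions (GCC/Clang builtin) */
    }
    /* matches - mismatches = 2*matches - n, same value as the {-1,+1} dot product */
    return 2 * matches - n;
}

int main(void)
{
    /* Toy example: all +1 against alternating +1/-1; expected dot product is 0. */
    uint32_t a[1] = { 0xFFFFFFFFu };
    uint32_t w[1] = { 0xAAAAAAAAu };
    printf("dot = %d\n", binary_dot(a, w, 32));
    return 0;
}

Because the signed result is recovered from the count of matching bits (2·matches - n), binary hardware can replace multiply-accumulate units with XNOR gates and popcount trees, which is the source of the energy savings the abstract describes.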
Similar Papers
A 7.663-TOPS 8.2-W Energy-efficient FPGA Accelerator for Binary Convolutional Neural Networks
FPGA-based hardware accelerators for convolutional neural networks (CNNs) have attracted great attention due to their higher energy efficiency than GPUs. However, it is challenging for FPGA-based solutions to achieve a higher throughput than GPU counterparts. In this paper, we demonstrate that FPGA acceleration can be a superior solution in terms of both throughput and energy efficiency when a ...
ResBinNet: Residual Binary Neural Network
Recent efforts on training light-weight binary neural networks offer promising execution/memory efficiency. This paper introduces ResBinNet, which is a composition of two interlinked methodologies aiming to address the slow convergence speed and limited accuracy of binary convolutional neural networks. The first method, called residual binarization, learns a multi-level binary representation fo...
nn-X: A Hardware Accelerator for Convolutional Neural Networks
Gokhale, Vinayak A. M.S.E.C.E, Purdue University, August 2014. nn-X A Hardware Accelerator for Convolutional Neural Networks. Major Professor: Eugenio Culurciello. Convolutional neural networks (ConvNets) are hierarchical models of the mammalian visual cortex. These models have been increasingly used in computer vision to perform object recognition and full scene understanding. ConvNets consist...
Tartan: Accelerating Fully-Connected and Convolutional Layers in Deep Learning Networks by Exploiting Numerical Precision Variability
Tartan (TRT), a hardware accelerator for inference with Deep Neural Networks (DNNs), is presented and evaluated on Convolutional Neural Networks. TRT exploits the variable per layer precision requirements of DNNs to deliver execution time that is proportional to the precision p in bits used per layer for convolutional and fully-connected layers. Prior art has demonstrated an accelerator with th...